

Section: Research Program

Function evaluation

Towards automatic design of function programs or circuits

Concerning function evaluation, what we have successfully automated so far is program generation for statically defined elementary functions of one real variable. These techniques will certainly need refining as we apply them to more functions, in particular to special functions or to compound functions. Applying them to arbitrary code at compile time raises several further challenges. The first one is to identify a relevant function in a program, along with the information needed to implement it efficiently: the range of the input values, the required output accuracy and/or rounding mode, etc. This requires interaction with specialists of classical compilers.
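
To make the target of this automation concrete, the C sketch below shows the typical shape of such a generated program for a single, statically defined elementary function, here a binary64 exponential: an additive range reduction, a low-degree polynomial evaluated in Horner form, and a reconstruction step. The function name and the coefficients (plain Taylor coefficients) are purely illustrative; an actual generator would produce carefully rounded minimax coefficients, a more accurate reduction, special-case handling, and a validated error bound.

    #include <math.h>

    /* Illustrative shape only: additive range reduction x = k*ln(2) + r with
       |r| <= ln(2)/2 (roughly), a degree-5 polynomial on r, and
       reconstruction by 2^k. */
    double exp_sketch(double x) {
        const double LN2 = 0.6931471805599453;   /* ln(2) rounded to binary64 */
        int    k = (int)rint(x / LN2);           /* reduction index           */
        double r = x - (double)k * LN2;          /* reduced argument          */
        /* Taylor polynomial of exp at 0, degree 5, Horner form with FMAs */
        double p = 1.0 / 120.0;
        p = fma(p, r, 1.0 / 24.0);
        p = fma(p, r, 1.0 / 6.0);
        p = fma(p, r, 0.5);
        p = fma(p, r, 1.0);
        p = fma(p, r, 1.0);
        return ldexp(p, k);                      /* reconstruction: p * 2^k   */
    }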

A second challenge is then to analyze such a function automatically, which typically involves the following tasks:

  • Compute maximal-width subranges on which the output is trivial, that is, requires no computation at all (zero, infinity, etc.). This can be extremely tedious and error-prone if done by hand. An important side effect of this step is to generate test vectors for corner cases (a small sketch of this kind of corner-case exploration is given after this list).

  • Identify properties that answer questions such as “Is further range reduction possible?”, “Are there floating-point midpoints?”, “Can overflow occur?”, etc. Today, this is essentially done by hand by experts, so the challenge is to automate it.

  • Investigate the automation of range reduction. Generic reduction schemes can be used (for example, splitting the input interval into sub-intervals), but they involve many parameters, and we have to model the associated cost/performance/accuracy tradeoffs. More function-specific range-reduction steps can be an outcome of the previous analysis steps.
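
As an illustration of the first task in the list above, the short C program below numerically brackets the inputs for which a binary64 exp has a trivial output (overflow to infinity, or a result in the subnormal range), and emits test vectors around these approximate boundaries. This is only a sketch: it relies on the accuracy of the libm log and exp, whereas an actual tool would have to certify the exact boundary floating-point numbers.

    #include <stdio.h>
    #include <math.h>
    #include <float.h>

    int main(void) {
        /* Approximate boundaries of the trivial subranges of binary64 exp. */
        double of = log(DBL_MAX);    /* above roughly this, exp(x) overflows     */
        double uf = log(DBL_MIN);    /* below roughly this, exp(x) is subnormal  */
        printf("overflow  boundary near %a\n", of);
        printf("subnormal boundary near %a\n", uf);
        /* Test vectors straddling the approximate boundaries. */
        double t[4] = { nextafter(of, -INFINITY), nextafter(of, INFINITY),
                        nextafter(uf, -INFINITY), nextafter(uf, INFINITY) };
        for (int i = 0; i < 4; i++)
            printf("exp(%a) = %a\n", t[i], exp(t[i]));
        return 0;
    }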

This general automation process will be progressively set up and refined by working on concrete implementations of a significant set of operators, hopefully driven by applications and through industrial collaborations: all the C99 elementary functions, other functions such as the inverse cumulative distribution functions used in random number generators, variations around the Euclidean norm such as x/√(x²+y²), complex arithmetic, interval operators, FFTs, etc.
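
As an example of why even a simple compound operator deserves this treatment, a direct evaluation of x/√(x²+y²) can overflow in the squarings although the exact result always lies in [-1, 1]. The C sketch below (with an illustrative function name) avoids this by an exact scaling of the operands; it deliberately ignores special inputs (NaNs, infinities, x = y = 0) and the rounding-error analysis that a generated implementation would have to provide.

    #include <math.h>

    /* x / sqrt(x*x + y*y) without spurious overflow in the squares: scale both
       operands by a power of two so that the larger one has magnitude close
       to 1, which leaves the mathematical result unchanged. */
    double cosine_like(double x, double y) {
        int e;
        double m = fmax(fabs(x), fabs(y));
        frexp(m, &e);                    /* m = f * 2^e with f in [0.5, 1) */
        double xs = ldexp(x, -e);        /* exact scalings by 2^-e         */
        double ys = ldexp(y, -e);
        return xs / sqrt(xs * xs + ys * ys);
    }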

Most of the software we design brings in some floating-point functionality. We currently target two main types of processors:

  • Embedded processors, with or without a floating-point unit (FPU), for which we need to implement the basic operators, coarser elementary functions, and compile-time arbitrary functions. On such targets, memory may be limited, and power consumption is important (a fixed-point sketch for such FPU-less targets is given after this list).

  • General-purpose processors that do possess an FPU for the basic operations. The challenge is then to make the best possible use of this FPU to implement coarser operators and compile-time functions. The main metrics here are performance and, to a lesser extent, code size.
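
When no FPU is available on the first class of targets, the generated code typically falls back on integer arithmetic. The fragment below is a minimal sketch of one polynomial (Horner) step in a hypothetical Q2.30 fixed-point format, using a 32x32 → 64-bit product; the format and the truncating rounding used here are illustrative choices, not necessarily those an actual generator would select.

    #include <stdint.h>

    /* Q2.30: 2 integer bits, 30 fraction bits, stored in an int32_t. */
    static inline int32_t q230_mul(int32_t a, int32_t b) {
        return (int32_t)(((int64_t)a * (int64_t)b) >> 30);  /* truncating product */
    }

    /* One Horner step acc*x + c, all operands in Q2.30. */
    static inline int32_t q230_horner_step(int32_t acc, int32_t x, int32_t c) {
        return q230_mul(acc, x) + c;
    }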

Mathematical tools for function evaluation

Challenges in function approximation

The algorithms currently implemented in the Sollya toolbox (see Section 5.1) provide, for functions of one variable, an end-to-end solution for finding near-optimal polynomial approximations whose coefficients are machine numbers. This includes a validated tight bound on the approximation error. We now want to generalize them in three main directions:

  • We first want to design and implement algorithms for computing multivariate polynomials that closely approximate a given multivariate function.

  • We also want to address the approximation of a function by a rational function. This is important for the software implementation of functions with poles. It would also make practical a hardware-oriented technique called the E-method.

  • Currently, our algorithms and their implementations use a basis of the form (x^i), where i belongs to a finite subset of ℕ of cardinality bounded by, say, 100. We now aim at dealing with other bases. In particular, the classical basis of Chebyshev polynomials should lead to a better numerical analysis without losing any efficiency during evaluation (a short evaluation sketch is given below). In general, we should be able to deal with any basis made of orthogonal polynomials. Using trigonometric polynomials as the basis should lead to efficient finite-precision digital filters.

In all three directions, we eventually want to constrain the coefficients to be machine numbers.
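
To support the efficiency remark in the third direction above, recall that a polynomial given in the Chebyshev basis can be evaluated with Clenshaw's recurrence, at essentially the same cost per degree as Horner's rule in the monomial basis. The C sketch below evaluates p(x) = c_0 T_0(x) + ... + c_n T_n(x); a validated implementation would in addition have to bound its rounding errors.

    /* Clenshaw evaluation of p(x) = c[0]*T_0(x) + ... + c[n]*T_n(x), where T_k
       is the Chebyshev polynomial of the first kind of degree k. */
    double clenshaw_eval(const double *c, int n, double x) {
        double b1 = 0.0, b2 = 0.0;              /* b_{k+1} and b_{k+2} */
        for (int k = n; k >= 1; k--) {
            double b = c[k] + 2.0 * x * b1 - b2;
            b2 = b1;
            b1 = b;
        }
        return c[0] + x * b1 - b2;
    }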

Approximation for digital filters

A digital filter implements a given transfer function, either as a polynomial (finite impulse response, or FIR filter) or as a rational function (infinite impulse response, or IIR filter). Classical techniques and toolboxes exist for computing such filters, but they amount to computing infinite-precision solutions and then rounding them. This rounding turns out to be numerically unstable in some situations. We intend to study to what extent an improved rounding procedure might yield better filters, in terms of efficiency (software implementation) or size (hardware implementation).
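
In its naive form, the rounding procedure mentioned above is nothing more than coefficient-wise rounding to the target format. The C fragment below sketches this baseline for a FIR filter with a hypothetical fixed-point coefficient format; it is precisely this approach, blind to the joint effect of all the roundings on the transfer function, that an improved procedure should outperform.

    #include <math.h>

    /* Naive coefficient quantization: round each "infinite-precision" FIR
       coefficient independently to the nearest multiple of 2^-frac_bits,
       ignoring the combined effect of the roundings on the frequency
       response. */
    void quantize_fir(const double *c_exact, double *c_rounded,
                      int n, int frac_bits) {
        double scale = ldexp(1.0, frac_bits);            /* 2^frac_bits */
        for (int i = 0; i < n; i++)
            c_rounded[i] = rint(c_exact[i] * scale) / scale;
    }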

Collaborations on this subject have begun with researchers from the signal processing community.

Challenges in the search for hard-to-round cases

For a given function f and a given floating-point format, the “hardest to round” (HR) points are the floating-point numbers x such that f(x) is nearest to a value where the rounding function changes. Knowing these HR points makes it possible to design efficient programs that, given a floating-point number y, always return the floating-point number nearest f(y). Such programs are called “correctly rounded” implementations of f.
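
For intuition only, the C program below carries out a brute-force version of this search at a toy scale: it scans every binary32 input in [1, 2), evaluates exp in binary64 as a stand-in for a higher-precision reference, and records the input whose image lies closest, in units of ulp, to a rounding breakpoint (a midpoint between two consecutive binary32 numbers). Such exhaustive scanning is hopeless for binary64 and beyond, and binary64 is not a rigorous reference, which is exactly why dedicated algorithms are needed.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double worst = 1.0;      /* smallest distance to a midpoint, in ulps */
        float  worst_x = 1.0f;
        for (float x = 1.0f; x < 2.0f; x = nextafterf(x, 2.0f)) {
            double z  = exp((double)x);            /* reference (not rigorous) */
            float  fz = (float)z;                  /* rounded-to-nearest image */
            /* midpoints just below and above fz, and the ulp above fz */
            double m_lo = ((double)fz + (double)nextafterf(fz, 0.0f)) / 2.0;
            double m_hi = ((double)fz + (double)nextafterf(fz, INFINITY)) / 2.0;
            double ulp  = (double)nextafterf(fz, INFINITY) - (double)fz;
            double d    = fmin(fabs(z - m_lo), fabs(z - m_hi)) / ulp;
            if (d < worst) { worst = d; worst_x = x; }
        }
        printf("hardest input found in [1,2): x = %a, distance = %.3g ulp\n",
               (double)worst_x, worst);
        return 0;
    }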

We have obtained and published HR points in the binary64 (“double precision”) and decimal64 formats of the IEEE 754-2008 standard for the most important functions of the standard mathematical library. However, in this line of research, we now have to tackle difficult challenges. First, our current methods for finding HR points cannot be used in large precisions such as the binary128 (“quad precision”) and decimal128 (128-bit decimal) formats of the IEEE 754-2008 standard. Also, the processes that generate our HR cases rely on complex and very long calculations (years of cumulated CPU time), which inevitably casts some doubt on the correctness of their results. Hence, we are reconsidering the methods used to get HR points, and mainly focus on three aspects:

  • large precisions: we must design new algorithms for dealing with precisions larger than double precision. Such precisions will become more and more important (even if double precision may be thought of as more than enough for a final result, it may not be sufficient for the intermediate results of a long or critical calculation);

  • formal proof: we must provide formal proofs of the critical parts of our methods. Another possibility is to have our programs generate certificates that establish the validity of their results; the focus would then be on formally proving these certificates;

  • aggressive computing: the methods we have designed for generating HR points in double precision require weeks of computation on hundreds of PCs. Even if we design faster algorithms, we must massively parallelize our methods, and study various ways of doing that.

These three aspects have been at the core of our TaMaDi ANR project (see Section 8.1.2): the project brought significant progress, but there is still much to be done.